-
-
Notifications
You must be signed in to change notification settings - Fork 820
feat(engine): Improve execution stalls troubleshooting, align dev and prod behavior, adding heartbeats.yield utility #2489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: 1706618 The changes in this PR will be included in the next version bump. This PR includes changesets to release 23 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Warning Rate limit exceeded@ericallam has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 35 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughAdds a heartbeats subsystem (HeartbeatsAPI, HeartbeatsManager interface, StandardHeartbeatsManager) in core, exposes it via v3 exports and the SDK (yield, getLastHeartbeat), and replaces interval-based worker heartbeats with event-driven heartbeats in CLI workers. Run-engine changes: new option treatProductionExecutionStallsAsOOM, runStatusFromError now requires environmentType and maps stall errors differently, increased default execution timeouts, guards to prevent operating on FINISHED snapshots, and permanently-fail flows updated to use latest snapshot objects. Documentation updates: troubleshooting for stalled executions and expanded machines/ResourceMonitor guidance. Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Pre-merge checks (1 passed, 1 warning, 1 inconclusive)❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/core/src/v3/errors.ts (1)
306-310
: Typo in error code literal: “TASK_HAS_N0_EXECUTION_SNAPSHOT”.The “0” digit will never match the real code (“NO”), breaking classification and falling into
assertExhaustive
. Fix the literal.Apply:
- case "TASK_HAS_N0_EXECUTION_SNAPSHOT": + case "TASK_HAS_NO_EXECUTION_SNAPSHOT":internal-packages/run-engine/src/engine/errors.ts (1)
55-55
: Typo in error code literal: “TASK_HAS_N0_EXECUTION_SNAPSHOT”.Same issue as in core errors; will never match the real code.
Apply:
- case "TASK_HAS_N0_EXECUTION_SNAPSHOT": + case "TASK_HAS_NO_EXECUTION_SNAPSHOT":
🧹 Nitpick comments (19)
apps/webapp/app/env.server.ts (1)
522-523
: 10-minute stall timeout defaults: verify operational alignmentConfirm dashboards/alarms and docs consistently reflect 10 minutes, and that staging/preview have overrides if needed.
packages/cli-v3/src/executions/taskRunProcess.ts (3)
112-116
: Don't silently swallow cancel errorsSwallowing here hides unexpected failures to deliver CANCEL. Log at debug.
try { await this.#cancel(); - } catch (err) {} + } catch (err) { + logger.debug("cancel: failed to send CANCEL to child", { err, pid: this.pid }); + }
126-127
: Capture and log flush errors during cleanupKeep cleanup resilient but log failures for diagnosis.
- await tryCatch(this.#flush()); + { + const [err] = await tryCatch(this.#flush()); + if (err) logger.debug("cleanup: flush failed", { err, pid: this.pid }); + }
442-443
: Capture and log flush errors during suspendSame rationale as cleanup.
- await tryCatch(this.#flush()); + { + const [err] = await tryCatch(this.#flush()); + if (err) logger.debug("suspend: flush failed", { err, pid: this.pid }); + }docs/troubleshooting.mdx (1)
184-215
: Tiny enhancement: mention env knobs for the 10‑min timeoutConsider adding: “You can configure the stall timeout via RUN_ENGINE_TIMEOUT_EXECUTING and RUN_ENGINE_TIMEOUT_EXECUTING_WITH_WAITPOINTS.”
packages/cli-v3/src/entryPoints/dev-run-controller.ts (2)
360-369
: FINISHED handling: detach handlers before suspend (reduce race window)Small window where a heartbeat could still fire. Detach first.
- await this.exitTaskRunProcessWithoutFailingRun({ + this.taskRunProcess?.unsafeDetachEvtHandlers(); + await this.exitTaskRunProcessWithoutFailingRun({ flush: true, reason: "already-finished", });
620-643
: Don’t await heartbeat I/O in the event handlerFire-and-forget to avoid backpressure if multiple heartbeats cluster.
- const response = await this.httpClient.dev.heartbeatRun( + void this.httpClient.dev.heartbeatRun( this.runFriendlyId, this.snapshotFriendlyId, { cpu: 0, memory: 0, } - ); - - if (!response.success) { - logger.debug("[DevRunController] Heartbeat failed", { error: response.error }); - } + ).then((response) => { + if (!response.success) { + logger.debug("[DevRunController] Heartbeat failed", { error: response.error }); + } + });packages/core/src/v3/heartbeats/types.ts (1)
1-8
: Prefertype
alias overinterface
per repo guidelines.Functionally equivalent here and aligns with “we usually use types over interfaces”.
Apply:
-export interface HeartbeatsManager { - startHeartbeat(id: string): void; - stopHeartbeat(): void; - yield(): Promise<void>; - reset(): void; - - get lastHeartbeat(): Date | undefined; -} +export type HeartbeatsManager = { + startHeartbeat(id: string): void; + stopHeartbeat(): void; + yield(): Promise<void>; + reset(): void; + readonly lastHeartbeat: Date | undefined; +};internal-packages/run-engine/src/engine/errors.ts (1)
29-34
: Avoid magic string comparison for environment.Use the exported constant if available, or a typed constant in-file to prevent typos.
Apply:
+const DEV: RuntimeEnvironmentType = "DEVELOPMENT"; - if (environmentType === "DEVELOPMENT") { + if (environmentType === DEV) { return "CANCELED"; }packages/trigger-sdk/src/v3/heartbeats.ts (1)
37-39
: Naming nit: preferlastHeartbeat
getter to mirror coreTo keep SDK and core consistent, consider exporting a getter-style property rather than a
getLastHeartbeat()
function, e.g.,lastHeartbeat(): Date | undefined
or re-exporting the value via a getter. Not critical.-export const heartbeats = { - yield: heartbeatsYield, - getLastHeartbeat: heartbeatsGetLastHeartbeat, -}; +export const heartbeats = { + yield: heartbeatsYield, + get lastHeartbeat() { + return heartbeatsGetLastHeartbeat(); + }, +};packages/cli-v3/src/entryPoints/managed-run-worker.ts (1)
33-34
: ImportattemptKey
to align heartbeat IDs with dev workerIf we standardize on attempt-level IDs, bring in
attemptKey
.traceContext, + attemptKey, heartbeats, } from "@trigger.dev/core/v3";
internal-packages/run-engine/src/engine/index.ts (2)
216-217
: Use Logger instead of console.log for startup messagePrefer the project Logger for consistency and log routing.
- console.log("✅ Starting run engine worker"); + this.logger.info("✅ Starting run engine worker");
1253-1265
: Avoid hardcoding “sent every 30s” in stall messagesHeartbeat cadence may vary across environments. Remove the fixed interval or derive it from config.
- ? `Run timed out after ${timeoutDuration} due to missing heartbeats (sent every 30s). Check if your \`trigger.dev dev\` CLI is still running, or if CPU-heavy work is blocking the main thread.` - : `Run timed out after ${timeoutDuration} due to missing heartbeats (sent every 30s). This typically happens when CPU-heavy work blocks the main thread.`; + ? `Run timed out after ${timeoutDuration} due to missing heartbeats. Check if your \`trigger.dev dev\` CLI is still running, or if CPU-heavy work is blocking the main thread.` + : `Run timed out after ${timeoutDuration} due to missing heartbeats. This typically happens when CPU-heavy work blocks the main thread.`;packages/core/src/v3/heartbeats/api.ts (2)
41-43
: Allow optional override when registering a global managerUseful for tests/dev hot-reload; keeps API explicit.
- public setGlobalManager(manager: HeartbeatsManager): boolean { - return registerGlobal(API_NAME, manager); - } + public setGlobalManager(manager: HeartbeatsManager, allowOverride = false): boolean { + return registerGlobal(API_NAME as any, manager, allowOverride); + }
58-65
: Annotate return types and avoid returning values for void methodsClarifies API shape and prevents accidental Promise propagation if implementations change.
- public startHeartbeat(id: string) { - return this.#getManager().startHeartbeat(id); - } + public startHeartbeat(id: string): void { + this.#getManager().startHeartbeat(id); + } - public stopHeartbeat() { - return this.#getManager().stopHeartbeat(); - } + public stopHeartbeat(): void { + this.#getManager().stopHeartbeat(); + }packages/core/src/v3/heartbeats/manager.ts (4)
22-28
: Yield without adding artificial latency24ms adds avoidable delay; a 0ms timeout is sufficient to yield cooperatively.
- // Only call setImmediate if we haven't yielded in the last interval + // Only yield if we haven't yielded in the last interval if (Date.now() - this.lastHeartbeatYieldTime >= this.intervalInMs) { - // await setImmediate(); - await setTimeout(24); + await new Promise<void>((r) => setTimeout(r, 0)); this.lastHeartbeatYieldTime = Date.now(); }
45-50
: Only mark lastHeartbeat on success and preserve error detailsKeep lastHeartbeat meaningful (successful send) and log the error object to retain stack/cause.
- const [error] = await tryCatch(this.listener(id)); - this.lastHeartbeatDate = new Date(); - - if (error) { - console.error("Failed to send HEARTBEAT message", { error: String(error) }); - } + const [error] = await tryCatch(this.listener(id)); + if (error) { + console.error("Failed to send HEARTBEAT message", { error }); + } else { + this.lastHeartbeatDate = new Date(); + }Would you like lastHeartbeat to represent “last attempted” instead? If so, keep the original semantics and just adjust the logger.
59-61
: Clear state on stop to avoid stale controller and gatingOptional, but makes stopHeartbeat fully idempotent and resets yield throttling until restarted.
stopHeartbeat(): void { - this.currentAbortController?.abort(); + this.currentAbortController?.abort(); + this.currentAbortController = undefined; + this.lastHeartbeatYieldTime = undefined; }
11-16
: Consider returning an unsubscribe from registerListenerLow-priority: returning a disposer helps tests swap listeners safely, even if you keep a single global callback at runtime.
- registerListener(callback: (id: string) => Promise<void>) { - this.listener = callback; - } + registerListener(callback: (id: string) => Promise<void>): () => void { + this.listener = callback; + return () => { + if (this.listener === callback) this.listener = undefined; + }; + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
references/hello-world/src/trigger/cpuHeavy.ts
is excluded by!references/**
📒 Files selected for processing (20)
apps/webapp/app/env.server.ts
(1 hunks)docs/troubleshooting.mdx
(1 hunks)internal-packages/run-engine/src/engine/errors.ts
(2 hunks)internal-packages/run-engine/src/engine/index.ts
(4 hunks)internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
(7 hunks)packages/cli-v3/src/entryPoints/dev-run-controller.ts
(8 hunks)packages/cli-v3/src/entryPoints/dev-run-worker.ts
(7 hunks)packages/cli-v3/src/entryPoints/managed-run-worker.ts
(6 hunks)packages/cli-v3/src/executions/taskRunProcess.ts
(3 hunks)packages/core/src/v3/errors.ts
(2 hunks)packages/core/src/v3/heartbeats-api.ts
(1 hunks)packages/core/src/v3/heartbeats/api.ts
(1 hunks)packages/core/src/v3/heartbeats/manager.ts
(1 hunks)packages/core/src/v3/heartbeats/types.ts
(1 hunks)packages/core/src/v3/index.ts
(1 hunks)packages/core/src/v3/links.ts
(1 hunks)packages/core/src/v3/utils/globals.ts
(2 hunks)packages/core/src/v3/workers/index.ts
(1 hunks)packages/trigger-sdk/src/v3/heartbeats.ts
(1 hunks)packages/trigger-sdk/src/v3/index.ts
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}
: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations
Files:
packages/core/src/v3/heartbeats-api.ts
packages/core/src/v3/index.ts
packages/core/src/v3/workers/index.ts
packages/core/src/v3/errors.ts
apps/webapp/app/env.server.ts
packages/core/src/v3/utils/globals.ts
packages/trigger-sdk/src/v3/heartbeats.ts
packages/core/src/v3/links.ts
packages/cli-v3/src/executions/taskRunProcess.ts
packages/core/src/v3/heartbeats/types.ts
packages/trigger-sdk/src/v3/index.ts
packages/cli-v3/src/entryPoints/dev-run-worker.ts
internal-packages/run-engine/src/engine/index.ts
packages/core/src/v3/heartbeats/api.ts
internal-packages/run-engine/src/engine/errors.ts
packages/core/src/v3/heartbeats/manager.ts
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
packages/cli-v3/src/entryPoints/managed-run-worker.ts
packages/cli-v3/src/entryPoints/dev-run-controller.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
We use zod a lot in packages/core and in the webapp
Files:
packages/core/src/v3/heartbeats-api.ts
packages/core/src/v3/index.ts
packages/core/src/v3/workers/index.ts
packages/core/src/v3/errors.ts
apps/webapp/app/env.server.ts
packages/core/src/v3/utils/globals.ts
packages/core/src/v3/links.ts
packages/core/src/v3/heartbeats/types.ts
packages/core/src/v3/heartbeats/api.ts
packages/core/src/v3/heartbeats/manager.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
When importing from @trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from @trigger.dev/core’s package.json
Files:
apps/webapp/app/env.server.ts
{apps/webapp/app/**/*.server.{ts,tsx},apps/webapp/app/routes/**/*.ts}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access environment variables only via the env export from app/env.server.ts; do not reference process.env directly
Files:
apps/webapp/app/env.server.ts
apps/webapp/app/**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Modules intended for test consumption under apps/webapp/app/**/*.ts must not read environment variables; accept configuration via options instead
Files:
apps/webapp/app/env.server.ts
🧠 Learnings (6)
📚 Learning: 2024-10-18T15:41:52.352Z
Learnt from: nicktrn
PR: triggerdotdev/trigger.dev#1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.
Applied to files:
packages/core/src/v3/errors.ts
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Import Trigger.dev APIs from "trigger.dev/sdk/v3" when writing tasks or related utilities
Applied to files:
packages/trigger-sdk/src/v3/heartbeats.ts
packages/trigger-sdk/src/v3/index.ts
packages/cli-v3/src/entryPoints/dev-run-worker.ts
packages/cli-v3/src/entryPoints/managed-run-worker.ts
📚 Learning: 2025-08-29T10:06:49.293Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-08-29T10:06:49.293Z
Learning: Applies to apps/webapp/**/*.{ts,tsx} : When importing from trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from trigger.dev/core’s package.json
Applied to files:
packages/trigger-sdk/src/v3/index.ts
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Define tasks using task({ id, run, ... }) with a unique id per project
Applied to files:
docs/troubleshooting.mdx
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : For idempotent child-task invocations, create and pass idempotencyKey (and optional TTL) when calling trigger()/batchTrigger() from tasks
Applied to files:
docs/troubleshooting.mdx
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use triggerAndWait() only from within a task context (not from generic app code) and handle result.ok or use unwrap() with error handling
Applied to files:
docs/troubleshooting.mdx
🧬 Code graph analysis (14)
packages/core/src/v3/heartbeats-api.ts (2)
packages/trigger-sdk/src/v3/heartbeats.ts (1)
heartbeats
(41-44)packages/core/src/v3/heartbeats/api.ts (1)
HeartbeatsAPI
(28-73)
packages/core/src/v3/errors.ts (1)
packages/core/src/v3/links.ts (1)
links
(1-33)
packages/core/src/v3/utils/globals.ts (2)
packages/core/src/v3/heartbeats/api.ts (1)
HeartbeatsManager
(70-72)packages/core/src/v3/heartbeats/types.ts (1)
HeartbeatsManager
(1-8)
packages/trigger-sdk/src/v3/heartbeats.ts (1)
packages/core/src/v3/heartbeats-api.ts (1)
heartbeats
(5-5)
packages/cli-v3/src/executions/taskRunProcess.ts (1)
packages/core/src/utils.ts (1)
tryCatch
(5-18)
packages/core/src/v3/heartbeats/types.ts (2)
packages/core/src/v3/heartbeats/api.ts (1)
HeartbeatsManager
(70-72)internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
id
(1582-1589)
packages/cli-v3/src/entryPoints/dev-run-worker.ts (3)
packages/core/src/v3/heartbeats/manager.ts (1)
StandardHeartbeatsManager
(5-77)packages/core/src/v3/heartbeats-api.ts (1)
heartbeats
(5-5)packages/core/src/v3/idempotencyKeys.ts (1)
attemptKey
(132-134)
internal-packages/run-engine/src/engine/index.ts (1)
packages/core/src/v3/index.ts (1)
formatDurationMilliseconds
(35-35)
packages/core/src/v3/heartbeats/api.ts (2)
packages/core/src/v3/heartbeats/types.ts (1)
HeartbeatsManager
(1-8)packages/core/src/v3/utils/globals.ts (3)
registerGlobal
(21-37)unregisterGlobal
(45-51)getGlobal
(39-43)
internal-packages/run-engine/src/engine/errors.ts (1)
packages/core/src/v3/schemas/common.ts (4)
TaskRunError
(200-205)TaskRunError
(207-207)RuntimeEnvironmentType
(547-552)RuntimeEnvironmentType
(554-555)
packages/core/src/v3/heartbeats/manager.ts (3)
packages/core/src/v3/heartbeats/api.ts (1)
HeartbeatsManager
(70-72)packages/core/src/v3/heartbeats/types.ts (1)
HeartbeatsManager
(1-8)internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
id
(1582-1589)
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (2)
internal-packages/run-engine/src/engine/errors.ts (2)
ServiceValidationError
(69-78)runStatusFromError
(5-67)internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts (2)
EnhancedExecutionSnapshot
(20-25)status
(428-449)
packages/cli-v3/src/entryPoints/managed-run-worker.ts (4)
packages/core/src/v3/heartbeats/manager.ts (1)
StandardHeartbeatsManager
(5-77)packages/core/src/v3/heartbeats-api.ts (1)
heartbeats
(5-5)packages/trigger-sdk/src/v3/heartbeats.ts (1)
heartbeats
(41-44)internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
id
(1582-1589)
packages/cli-v3/src/entryPoints/dev-run-controller.ts (1)
packages/core/src/v3/errors.ts (1)
SuspendedProcessError
(527-533)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
🔇 Additional comments (24)
packages/core/src/v3/links.ts (1)
17-18
: LGTM: new troubleshooting linkMatches the new doc anchor.
packages/cli-v3/src/entryPoints/dev-run-controller.ts (5)
7-7
: Import of SuspendedProcessErrorOK; used to special-case suspend exits.
528-533
: Suspend path handling looks correctGracefully exits without failing the run on suspend.
598-599
: Run attempt lifecycle flag initializationGood reset before execution.
670-671
: Marking completion state before returning processAppropriate to guard double-completion flows.
792-822
: Idempotent shutdown helper: good additionCentralizes teardown; prevents duplicate onFinished calls.
packages/core/src/v3/workers/index.ts (1)
32-32
: Good re-export; keeps workers surface coherent.No issues. This enables CLI workers to import the standard manager from a stable path.
packages/trigger-sdk/src/v3/index.ts (1)
18-18
: SDK now surfaces heartbeats — looks correct.Additive, no breaking changes. Please ensure docs/examples reference
import { heartbeats } from "trigger.dev/sdk/v3"
.If helpful, I can open a docs follow-up PR to add usage examples. Want me to draft it?
packages/core/src/v3/heartbeats-api.ts (1)
1-5
: Singleton entrypoint is fine and side-effect free when unused.Naming aligns with other v3 “-api” modules; no default export; tree-shakeable. LGTM.
packages/core/src/v3/index.ts (1)
21-21
: Publicly exposing heartbeats API via v3 index is correct.Additive export; no ordering hazards observed.
packages/core/src/v3/utils/globals.ts (1)
3-3
: Key name consistency verified
TheAPI_NAME
constant ("heartbeats"
) inheartbeats/api.ts
matches the["heartbeats"]
slot inglobals.ts
. No changes needed.packages/core/src/v3/errors.ts (1)
285-347
: Confirm stalled-executing errors are intended to be retriable
internal-packages/run-engine/src/engine/retrying.ts and apps/webapp/app/v3/services/completeAttempt.server.ts both treatTASK_RUN_STALLED_EXECUTING
(and its wait-points variant) as retriable—mapping to CANCELED in DEV and triggering retries. Verify this behavior is desired to avoid excessive churn in DEV.internal-packages/run-engine/src/engine/errors.ts (1)
5-8
: Verified no stale 1-arg calls remain. All invocations of runStatusFromError now pass two arguments.packages/trigger-sdk/src/v3/heartbeats.ts (2)
30-32
: Good wrapper: awaitable yield is correctly delegatedThe wrapper cleanly forwards to the core API and keeps the public surface small.
41-44
: Public API surface looks goodMinimal, intuitive, aligns with docs.
packages/cli-v3/src/entryPoints/dev-run-worker.ts (2)
563-564
: Heartbeat lifecycle closed correctlyStopping the heartbeat in
finally
ensures no leaks between runs.
698-700
: Listener wiring is correct and awaits IPC sendThe await prevents message pile-up; manager already guards/logs failures internally.
packages/cli-v3/src/entryPoints/managed-run-worker.ts (2)
299-300
: Reset covers heartbeats as wellGood inclusion to ensure clean state before next run.
735-737
: Listener registration LGTMHeartbeats emit via IPC on schedule; matches core manager contract.
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (4)
694-696
: Good guard: prevent succeeding an already finished runThis avoids double-finishing edge cases under races.
851-853
: Good guard: prevent failing an already finished runSame rationale; this is a safe, defensive check.
933-941
: Passing latestSnapshot object through failure path is cleaner and correctUsing EnhancedExecutionSnapshot ensures correct previousSnapshotId and environment-aware status mapping.
Also applies to: 1452-1476, 1522-1522
873-914
: Confirm forceRequeue-driven runAttemptFailed emission semantics
forceRequeue always emits a runAttemptFailed event (runAttemptSystem.ts L873–914) even if the final outcome is fail_run or cancel_run; verify downstream analytics/UX won’t double-report or confuse terminal failure signals.packages/core/src/v3/heartbeats/manager.ts (1)
1-78
: Node-only contexts confirmed;node:timers/promises
imports are safe. StandardHeartbeatsManager is only instantiated in CLI entry points and server-side (.server.ts
) code with no client-side TSX or browser bundles, so no bundler breakage will occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (6)
packages/core/src/v3/heartbeats/api.ts (1)
6-24
: Resolved: Noop manager matches the interface nowstartHeartbeat/stopHeartbeat are sync and yield returns Promise. Looks good.
packages/cli-v3/src/entryPoints/managed-run-worker.ts (2)
533-534
: Prefer attempt-level heartbeat IDs for parity and cleaner lifecycleUse attemptKey(execution) if available so heartbeats tie to attempts, not runs.
- standardHeartbeatsManager.startHeartbeat(_execution.run.id); + // Prefer attempt-level ID to avoid collisions across retries + standardHeartbeatsManager.startHeartbeat(attemptKey(execution));To locate attemptKey in the repo (and confirm import path), run:
#!/bin/bash rg -nP -C2 '\battemptKey\s*\(' --type=ts
139-143
: Sanitize HEARTBEAT_INTERVAL_MS to avoid 0/NaN tight loopsClamp and default the interval before constructing the manager.
-const standardHeartbeatsManager = new StandardHeartbeatsManager( - parseInt(heartbeatIntervalMs ?? "30000", 10) -); +const intervalMs = (() => { + const n = Number(heartbeatIntervalMs); + return Number.isFinite(n) && n > 0 ? Math.max(250, Math.floor(n)) : 30_000; +})(); +const standardHeartbeatsManager = new StandardHeartbeatsManager(intervalMs); heartbeats.setGlobalManager(standardHeartbeatsManager);packages/core/src/v3/heartbeats/manager.ts (2)
1-4
: Avoid Node-only timers in core; make timers isomorphicnode:timers/promises breaks web/edge bundling. Replace with an abortable sleep helper and an async while loop.
import { tryCatch } from "../tryCatch.js"; import { HeartbeatsManager } from "./types.js"; -import { setInterval, setImmediate, setTimeout } from "node:timers/promises"; + +// Isomorphic helpers (work in Node, browser, and edge) +const delay = (ms: number) => + new Promise<void>((resolve) => globalThis.setTimeout(resolve, ms)); + +function abortableSleep(ms: number, signal: AbortSignal): Promise<void> { + return new Promise<void>((resolve, reject) => { + if (signal.aborted) return reject(Object.assign(new Error("Aborted"), { name: "AbortError" })); + const id = globalThis.setTimeout(() => { + signal.removeEventListener("abort", onAbort); + resolve(); + }, ms); + function onAbort() { + globalThis.clearTimeout(id); + reject(Object.assign(new Error("Aborted"), { name: "AbortError" })); + } + signal.addEventListener("abort", onAbort, { once: true }); + }); +}
40-58
: Refactor loop to isomorphic, abortable while-loop and only update lastHeartbeat on successRemoves Node dependency and avoids reporting lastHeartbeat when the send failed.
- private async startHeartbeatLoop(id: string, signal: AbortSignal) { - try { - for await (const _ of setInterval(this.intervalInMs, undefined, { - signal, - })) { - if (this.listener) { - const [error] = await tryCatch(this.listener(id)); - this.lastHeartbeatDate = new Date(); - - if (error) { - console.error("Failed to send HEARTBEAT message", { error: String(error) }); - } - } - } - } catch (error) { - // Ignore errors as we expect them to be thrown when the heartbeat is stopped - // And since we tryCatch inside the loop, we don't need to handle any other errors here - } - } + private async startHeartbeatLoop(id: string, signal: AbortSignal) { + try { + while (!signal.aborted) { + await abortableSleep(this.intervalInMs, signal); + if (signal.aborted) break; + if (this.listener) { + const [error] = await tryCatch(this.listener(id)); + if (error) { + console.error("Failed to send HEARTBEAT message", { error: String(error) }); + } else { + this.lastHeartbeatDate = new Date(); + } + } + } + } catch { + // Expected on abort; no-op. + } + }packages/cli-v3/src/entryPoints/dev-run-controller.ts (1)
53-55
: SIGTERM handler uses unstable reference; off() won’t remove it and this-binding is fragileRegistering with
this.sigterm.bind(this)
creates a new function, butoff("SIGTERM", this.sigterm)
uses a different reference → leak and potential wrongthis
. Use a stable field.Apply:
@@ - private isCompletingRun = false; - private isShuttingDown = false; + private isCompletingRun = false; + private isShuttingDown = false; + private readonly onSigterm = () => { void this.sigterm(); }; @@ - process.on("SIGTERM", this.sigterm.bind(this)); + process.on("SIGTERM", this.onSigterm); @@ - process.off("SIGTERM", this.sigterm); + process.off("SIGTERM", this.onSigterm);Also applies to: 126-127, 859-860
🧹 Nitpick comments (3)
packages/core/src/v3/heartbeats/api.ts (2)
41-47
: Expose allowOverride for controlled re-registrationSome runtimes may need to swap managers (dev ↔ prod). Consider threading allowOverride through to registerGlobal.
- public setGlobalManager(manager: HeartbeatsManager): boolean { - return registerGlobal(API_NAME, manager); - } + public setGlobalManager( + manager: HeartbeatsManager, + { allowOverride = false }: { allowOverride?: boolean } = {} + ): boolean { + return registerGlobal(API_NAME, manager, allowOverride); + }
49-56
: Clarify reset semantics (resets then disables)reset() currently clears the underlying manager and unregisters the API. That’s a bit surprising for a method named reset. Either document the behavior or split into reset() and resetAndDisable().
+ /** + * Resets the current manager state and then disables the global API. + * NOTE: Call setGlobalManager(...) again to re-enable. + */ public reset() { this.#getManager().reset(); this.disable(); }packages/cli-v3/src/entryPoints/managed-run-worker.ts (1)
719-724
: Also sanitize usage heartbeat interval (minor)Mirror the clamp/default logic to avoid overly chatty usage heartbeats in misconfigurations.
- const prodUsageManager = new ProdUsageManager(devUsageManager, { - heartbeatIntervalMs: usageIntervalMs ? parseInt(usageIntervalMs, 10) : undefined, + const prodUsageManager = new ProdUsageManager(devUsageManager, { + heartbeatIntervalMs: (() => { + if (!usageIntervalMs) return undefined; + const n = Number(usageIntervalMs); + return Number.isFinite(n) && n > 0 ? Math.max(1_000, Math.floor(n)) : undefined; + })(),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
packages/cli-v3/src/entryPoints/dev-run-controller.ts
(9 hunks)packages/cli-v3/src/entryPoints/managed-run-worker.ts
(7 hunks)packages/core/src/v3/heartbeats/api.ts
(1 hunks)packages/core/src/v3/heartbeats/manager.ts
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}
: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations
Files:
packages/cli-v3/src/entryPoints/dev-run-controller.ts
packages/core/src/v3/heartbeats/manager.ts
packages/core/src/v3/heartbeats/api.ts
packages/cli-v3/src/entryPoints/managed-run-worker.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
We use zod a lot in packages/core and in the webapp
Files:
packages/core/src/v3/heartbeats/manager.ts
packages/core/src/v3/heartbeats/api.ts
🧠 Learnings (2)
📚 Learning: 2024-10-18T15:41:52.352Z
Learnt from: nicktrn
PR: triggerdotdev/trigger.dev#1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.
Applied to files:
packages/cli-v3/src/entryPoints/dev-run-controller.ts
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Import Trigger.dev APIs from "trigger.dev/sdk/v3" when writing tasks or related utilities
Applied to files:
packages/cli-v3/src/entryPoints/managed-run-worker.ts
🧬 Code graph analysis (4)
packages/cli-v3/src/entryPoints/dev-run-controller.ts (1)
packages/core/src/v3/errors.ts (1)
SuspendedProcessError
(527-533)
packages/core/src/v3/heartbeats/manager.ts (2)
packages/core/src/v3/heartbeats/api.ts (1)
HeartbeatsManager
(70-72)packages/core/src/v3/heartbeats/types.ts (1)
HeartbeatsManager
(1-8)
packages/core/src/v3/heartbeats/api.ts (2)
packages/core/src/v3/heartbeats/types.ts (1)
HeartbeatsManager
(1-8)packages/core/src/v3/utils/globals.ts (3)
registerGlobal
(21-37)unregisterGlobal
(45-51)getGlobal
(39-43)
packages/cli-v3/src/entryPoints/managed-run-worker.ts (5)
packages/core/src/v3/heartbeats/manager.ts (1)
StandardHeartbeatsManager
(5-78)packages/core/src/v3/workers/index.ts (1)
StandardHeartbeatsManager
(32-32)packages/core/src/v3/heartbeats-api.ts (1)
heartbeats
(5-5)packages/trigger-sdk/src/v3/heartbeats.ts (1)
heartbeats
(41-44)internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
id
(1582-1589)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (6)
packages/cli-v3/src/entryPoints/managed-run-worker.ts (3)
561-564
: Good: always stop heartbeat in finallyPrevents leaks if execution throws or is cancelled.
296-301
: Good: reset heartbeats with the rest of the environmentEnsures no stale timers or state across runs.
737-739
: LGTM: event-driven heartbeat wiringListener sends TASK_HEARTBEAT on each tick; aligns with the new manager.
packages/cli-v3/src/entryPoints/dev-run-controller.ts (3)
7-7
: Importing SuspendedProcessError — good usageUsing SuspendedProcessError to distinguish suspend vs. fail paths makes the error flow clearer.
598-599
: isCompletingRun flags — appropriate gatingSetting/resetting around execute/complete provides a simple guard against double-finish races.
Also applies to: 670-671
528-533
: SuspendedProcessError branch — appropriate fast-exitTreating a suspension as a non-error path and finishing without completion submission is correct.
… prod behavior, adding heartbeats.yield utility
…age, add more information to the docs, improve resource monitor and add it to the docs
928396a
to
d1c51c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
♻️ Duplicate comments (7)
packages/core/src/v3/errors.ts (1)
635-640
: Also map the WITH_WAITPOINTS variant to the same troubleshooting link.Without this, users hitting the waitpoints flavor won’t see the guide.
TASK_RUN_STALLED_EXECUTING: { link: { name: "Read our troubleshooting guide", href: links.docs.troubleshooting.stalledExecution, }, }, + TASK_RUN_STALLED_EXECUTING_WITH_WAITPOINTS: { + link: { + name: "Read our troubleshooting guide", + href: links.docs.troubleshooting.stalledExecution, + }, + },packages/cli-v3/src/entryPoints/dev-run-controller.ts (3)
126-127
: SIGTERM listener not removed — use a stable handler referenceThe handler is registered with a bound function but removed with an unbound reference, so the listener leaks.
@@ - process.on("SIGTERM", this.sigterm.bind(this)); + process.on("SIGTERM", this.onSigterm); @@ - process.off("SIGTERM", this.sigterm); + process.off("SIGTERM", this.onSigterm);Add a stable field:
@@ private isShuttingDown = false; + private readonly onSigterm = () => { void this.sigterm(); }
Also applies to: 859-860
360-369
: FINISHED path leaks the borrowed process and leaves listeners installedOn FINISHED you suspend and call shutdown, but shutdownExecution doesn’t return the process to the pool, unsubscribe run notifications, or detach SIGTERM; exitTaskRunProcessWithoutFailingRun also doesn’t guard suspend errors. This causes pool starvation and listener leaks.
@@ case "FINISHED": { if (this.isCompletingRun) { logger.debug("Run is finished but we're completing it, skipping"); return; } await this.exitTaskRunProcessWithoutFailingRun({ flush: true, reason: "already-finished", }); return; } @@ - private async exitTaskRunProcessWithoutFailingRun({ + private async exitTaskRunProcessWithoutFailingRun({ flush, reason, }: { flush: boolean; reason: string; }) { - await this.taskRunProcess?.suspend({ flush }); - - // No services should be left running after this line - let's make sure of it - this.shutdownExecution(`exitTaskRunProcessWithoutFailingRun: ${reason}`); + try { + await this.taskRunProcess?.suspend({ flush }); + } catch (error) { + logger.debug("Failed to suspend task run process", { error }); + } + // No services should be left running after this line - let's make sure of it + await this.shutdownExecution(`exitTaskRunProcessWithoutFailingRun: ${reason}`); } - private shutdownExecution(reason: string) { + private async shutdownExecution(reason: string) { if (this.isShuttingDown) { logger.debug(`[shutdown] ${reason} (already shutting down)`, { newReason: reason, }); return; } logger.debug(`[shutdown] ${reason}`); this.isShuttingDown = true; + // Detach signal handler early + process.off("SIGTERM", this.onSigterm); + this.snapshotPoller.stop(); - this.opts.onFinished(); - this.taskRunProcess?.unsafeDetachEvtHandlers(); + // Unsubscribe if still subscribed + if (this.state.phase === "RUN") { + try { + this.unsubscribeFromRunNotifications(this.state); + } catch (error) { + logger.debug("[shutdown] Failed to unsubscribe from run notifications", { error }); + } + this.state = { phase: "IDLE" }; + } + + // Return process to pool if held + if (this.taskRunProcess) { + try { + const version = this.opts.worker.serverWorker?.version || "unknown"; + await this.opts.taskRunProcessPool.returnProcess(this.taskRunProcess, version); + } catch (error) { + logger.debug("[shutdown] Failed to return task run process to pool", { error }); + } finally { + this.taskRunProcess = undefined; + } + } + + this.taskRunProcess?.unsafeDetachEvtHandlers(); + this.opts.onFinished(); }Also applies to: 792-804, 805-821
622-643
: Wrap heartbeat await in try/catch to avoid unhandled rejectionsNetwork/cancellation can throw and surface as unhandled rejections in the event handler.
- taskRunProcess.onTaskRunHeartbeat.attach(async (runId) => { - if (!this.runFriendlyId || !this.snapshotFriendlyId) { - logger.debug("[DevRunController] Skipping heartbeat, no run ID or snapshot ID"); - return; - } - - logger.debug("[DevRunController] Sending heartbeat"); - - const response = await this.httpClient.dev.heartbeatRun( - this.runFriendlyId, - this.snapshotFriendlyId, - { - cpu: 0, - memory: 0, - } - ); - - if (!response.success) { - logger.debug("[DevRunController] Heartbeat failed", { error: response.error }); - } - }); + taskRunProcess.onTaskRunHeartbeat.attach(async (_runId) => { + try { + if (!this.runFriendlyId || !this.snapshotFriendlyId) { + logger.debug("[DevRunController] Skipping heartbeat, no run ID or snapshot ID"); + return; + } + logger.debug("[DevRunController] Sending heartbeat"); + const response = await this.httpClient.dev.heartbeatRun( + this.runFriendlyId, + this.snapshotFriendlyId, + { cpu: 0, memory: 0 } + ); + if (!response.success) { + logger.debug("[DevRunController] Heartbeat failed", { error: response.error }); + } + } catch (error) { + logger.debug("[DevRunController] Heartbeat threw", { error }); + } + });packages/core/src/v3/heartbeats/manager.ts (3)
11-12
: Sanitize and clamp interval; avoid parameter property for validationPrevents invalid/zero/negative intervals and ensures integer ms.
-export class StandardHeartbeatsManager implements HeartbeatsManager { - private listener: ((id: string) => Promise<void>) | undefined = undefined; - private currentAbortController: AbortController | undefined = undefined; - private lastHeartbeatYieldTime: number | undefined = undefined; - private lastHeartbeatDate: Date | undefined = undefined; - - constructor(private readonly intervalInMs: number) {} +export class StandardHeartbeatsManager implements HeartbeatsManager { + private listener: ((id: string) => Promise<void>) | undefined = undefined; + private currentAbortController: AbortController | undefined = undefined; + private lastHeartbeatYieldTime: number | undefined = undefined; + private lastHeartbeatDate: Date | undefined = undefined; + private readonly intervalInMs: number; + + constructor(intervalInMs: number) { + const n = Math.floor(Number(intervalInMs)); + this.intervalInMs = Number.isFinite(n) && n > 0 ? Math.max(250, n) : 1_000; + }
31-38
: Good: stop previous heartbeat before starting a new onePrevents overlapping loops and controller leaks. LGTM.
1-4
: Make heartbeats isomorphic; remove node:timers/promises dependencyCore should be runtime-agnostic (Node/web/edge). Replace node-only timers with an abortable sleep loop; also update lastHeartbeatDate only on successful sends.
-import { tryCatch } from "../tryCatch.js"; -import { HeartbeatsManager } from "./types.js"; -import { setInterval, setImmediate, setTimeout } from "node:timers/promises"; +import { tryCatch } from "../tryCatch.js"; +import { HeartbeatsManager } from "./types.js"; + +// Isomorphic delay +function delay(ms: number): Promise<void> { + return new Promise((resolve) => setTimeout(resolve, ms)); +} + +// Abortable sleep compatible with Node, browser, and edge runtimes +function abortableSleep(ms: number, signal: AbortSignal): Promise<void> { + return new Promise<void>((resolve, reject) => { + const id: any = setTimeout(resolve, ms); + const onAbort = () => { + clearTimeout(id); + const err = new Error("Aborted"); + // @ts-expect-error: mark for diagnostics + err.name = "AbortError"; + reject(err); + }; + if (signal.aborted) { + onAbort(); + return; + } + signal.addEventListener("abort", onAbort, { once: true }); + }); +} @@ async yield(): Promise<void> { if (!this.lastHeartbeatYieldTime) { return; } - // Only call setImmediate if we haven't yielded in the last interval + // Throttle yielding to at most once per interval if (Date.now() - this.lastHeartbeatYieldTime >= this.intervalInMs) { - // await setImmediate(); - await setTimeout(24); + await delay(24); this.lastHeartbeatYieldTime = Date.now(); } } @@ - private async startHeartbeatLoop(id: string, signal: AbortSignal) { + private async startHeartbeatLoop(id: string, signal: AbortSignal) { try { - for await (const _ of setInterval(this.intervalInMs, undefined, { - signal, - })) { - if (this.listener) { - const [error] = await tryCatch(this.listener(id)); - this.lastHeartbeatDate = new Date(); - - if (error) { - console.error("Failed to send HEARTBEAT message", { error: String(error) }); - } - } - } + while (!signal.aborted) { + await abortableSleep(this.intervalInMs, signal); + if (signal.aborted) break; + if (this.listener) { + const [error] = await tryCatch(this.listener(id)); + if (error) { + console.error("Failed to send HEARTBEAT message", { error }); + } else { + this.lastHeartbeatDate = new Date(); + } + } + } } catch (error) { // Ignore errors as we expect them to be thrown when the heartbeat is stopped // And since we tryCatch inside the loop, we don't need to handle any other errors here } }Also applies to: 17-29, 40-58
🧹 Nitpick comments (12)
packages/trigger-sdk/src/v3/index.ts (1)
18-18
: Narrow the re-export to the named symbol (heartbeats
) — avoidexport *
heartbeats
is a named export in ./heartbeats.js; there is no default export orHeartbeatsAPI
type to re-export, so replace the wildcard with an explicit named export.-export * from "./heartbeats.js"; +export { heartbeats } from "./heartbeats.js";internal-packages/run-engine/src/engine/types.ts (1)
74-74
: Document the new option for clarity.Add a brief JSDoc so consumers understand behavior and default.
heartbeatTimeoutsMs?: Partial<HeartbeatTimeouts>; - treatProductionExecutionStallsAsOOM?: boolean; + /** When true, production EXECUTING stalls are surfaced as TASK_PROCESS_OOM_KILLED instead of stall errors. Default: false. */ + treatProductionExecutionStallsAsOOM?: boolean;packages/core/src/v3/errors.ts (1)
312-316
: Remove stale TODO.The stalled-executing codes are already handled here; the TODO is obsolete.
- //new heartbeat error - //todo case "TASK_RUN_STALLED_EXECUTING": case "TASK_RUN_STALLED_EXECUTING_WITH_WAITPOINTS":internal-packages/run-engine/src/engine/index.ts (2)
217-218
: Use logger instead of console.log for consistency.Route through the configured Logger.
- console.log("✅ Starting run engine worker"); + this.logger.info("✅ Starting run engine worker");
1265-1296
: Messaging tweaks: avoid stale heartbeat interval and align OOM text.
- Don’t hardcode “heartbeats (sent every 30s)”—this may drift.
- Match the OOM message to the pretty error copy for consistency.
- const errorMessage = - latestSnapshot.environmentType === "DEVELOPMENT" - ? `Run timed out after ${timeoutDuration} due to missing heartbeats (sent every 30s). Check if your \`trigger.dev dev\` CLI is still running, or if CPU-heavy work is blocking the main thread.` - : `Run timed out after ${timeoutDuration} due to missing heartbeats (sent every 30s). This typically happens when CPU-heavy work blocks the main thread.`; + const errorMessage = + latestSnapshot.environmentType === "DEVELOPMENT" + ? `Run timed out after ${timeoutDuration} due to missing heartbeats. Check if your \`trigger.dev dev\` CLI is still running, or if CPU-heavy work is blocking the main thread.` + : `Run timed out after ${timeoutDuration} due to missing heartbeats. This typically happens when CPU-heavy work blocks the main thread.`; @@ - ? ({ - type: "INTERNAL_ERROR", - code: "TASK_PROCESS_OOM_KILLED", - message: "Run was terminated due to running out of memory", - } satisfies TaskRunInternalError) + ? ({ + type: "INTERNAL_ERROR", + code: "TASK_PROCESS_OOM_KILLED", + message: + "Your run was terminated due to exceeding the machine's memory limit. Try increasing the machine preset in your task options or replay using a larger machine.", + } satisfies TaskRunInternalError)docs/machines.mdx (7)
33-41
: Table/unit consistency (GB vs GiB)Elsewhere in the doc/code you calculate using 1024-based units (GiB), but the table labels “10GB”. Consider standardizing on either GB (10^9) or GiB (2^30) across the page to avoid confusion.
178-181
: Disk limit unit mismatch and namingDISK_LIMIT_GB is computed with 1024-based math and later displayed as “GiB”. Either rename to DISK_LIMIT_GiB or convert using 1000-based math to align with “GB”.
Also applies to: 716-721
385-388
: PID filter may miss legitimate targetsFiltering to pids > process.pid will ignore older long-lived targets (e.g., services). If the intent is “child-only,” document it; otherwise, remove the filter or make it configurable.
Also applies to: 452-454
198-200
: Default verbosity may be too chattyverbose = true includes full heapStats in every payload. Consider defaulting to false and enabling via config to reduce log volume and PII risk in logs.
Apply this diff:
- this.verbose = true; + this.verbose = config.compactLogging ? false : false; + // consider an explicit `verbose` flag in ResourceMonitorConfigAlso applies to: 749-754
896-900
: Use extensionless import in TS exampleTo avoid TS/ESM confusion, drop the .js extension in the TS code sample.
Apply this diff:
-import { ResourceMonitor } from "../resourceMonitor.js"; +import { ResourceMonitor } from "../resourceMonitor";
590-601
: Label formatting: parsing stringified percents back to numbersYou stringify percents in payload, then parseFloat them to build labels. Consider keeping numeric fields numeric in payload and only formatting at the edge to simplify consumption.
Also applies to: 606-617
71-75
: Platform expectationsResourceMonitor relies on Linux tools (/proc, du, ps). Add a one-liner note that it targets Trigger’s Linux runtime; behavior on macOS/Windows will differ.
Also applies to: 851-887, 902-905
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
docs/images/machines-resource-monitor-ffmpeg.png
is excluded by!**/*.png
docs/images/machines-resource-monitor-logs.png
is excluded by!**/*.png
references/hello-world/src/resourceMonitor.ts
is excluded by!references/**
references/hello-world/src/trigger/cpuHeavy.ts
is excluded by!references/**
📒 Files selected for processing (23)
apps/webapp/app/env.server.ts
(2 hunks)apps/webapp/app/v3/runEngine.server.ts
(1 hunks)docs/machines.mdx
(5 hunks)docs/troubleshooting.mdx
(1 hunks)internal-packages/run-engine/src/engine/errors.ts
(2 hunks)internal-packages/run-engine/src/engine/index.ts
(3 hunks)internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
(7 hunks)internal-packages/run-engine/src/engine/types.ts
(1 hunks)packages/cli-v3/src/entryPoints/dev-run-controller.ts
(9 hunks)packages/cli-v3/src/entryPoints/dev-run-worker.ts
(7 hunks)packages/cli-v3/src/entryPoints/managed-run-worker.ts
(7 hunks)packages/cli-v3/src/executions/taskRunProcess.ts
(3 hunks)packages/core/src/v3/errors.ts
(2 hunks)packages/core/src/v3/heartbeats-api.ts
(1 hunks)packages/core/src/v3/heartbeats/api.ts
(1 hunks)packages/core/src/v3/heartbeats/manager.ts
(1 hunks)packages/core/src/v3/heartbeats/types.ts
(1 hunks)packages/core/src/v3/index.ts
(1 hunks)packages/core/src/v3/links.ts
(1 hunks)packages/core/src/v3/utils/globals.ts
(2 hunks)packages/core/src/v3/workers/index.ts
(1 hunks)packages/trigger-sdk/src/v3/heartbeats.ts
(1 hunks)packages/trigger-sdk/src/v3/index.ts
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/troubleshooting.mdx
🚧 Files skipped from review as they are similar to previous changes (13)
- packages/core/src/v3/heartbeats/types.ts
- packages/core/src/v3/index.ts
- packages/core/src/v3/workers/index.ts
- packages/core/src/v3/heartbeats-api.ts
- packages/core/src/v3/utils/globals.ts
- internal-packages/run-engine/src/engine/errors.ts
- packages/trigger-sdk/src/v3/heartbeats.ts
- packages/core/src/v3/heartbeats/api.ts
- packages/cli-v3/src/entryPoints/managed-run-worker.ts
- packages/cli-v3/src/entryPoints/dev-run-worker.ts
- internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
- packages/cli-v3/src/executions/taskRunProcess.ts
- apps/webapp/app/env.server.ts
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}
: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations
Files:
internal-packages/run-engine/src/engine/types.ts
packages/core/src/v3/links.ts
apps/webapp/app/v3/runEngine.server.ts
packages/trigger-sdk/src/v3/index.ts
internal-packages/run-engine/src/engine/index.ts
packages/core/src/v3/errors.ts
packages/cli-v3/src/entryPoints/dev-run-controller.ts
packages/core/src/v3/heartbeats/manager.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
We use zod a lot in packages/core and in the webapp
Files:
packages/core/src/v3/links.ts
apps/webapp/app/v3/runEngine.server.ts
packages/core/src/v3/errors.ts
packages/core/src/v3/heartbeats/manager.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
When importing from @trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from @trigger.dev/core’s package.json
Files:
apps/webapp/app/v3/runEngine.server.ts
{apps/webapp/app/**/*.server.{ts,tsx},apps/webapp/app/routes/**/*.ts}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access environment variables only via the env export from app/env.server.ts; do not reference process.env directly
Files:
apps/webapp/app/v3/runEngine.server.ts
apps/webapp/app/**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Modules intended for test consumption under apps/webapp/app/**/*.ts must not read environment variables; accept configuration via options instead
Files:
apps/webapp/app/v3/runEngine.server.ts
🧠 Learnings (5)
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Import Trigger.dev APIs from "trigger.dev/sdk/v3" when writing tasks or related utilities
Applied to files:
packages/trigger-sdk/src/v3/index.ts
📚 Learning: 2025-08-29T10:06:49.293Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-08-29T10:06:49.293Z
Learning: Prefer Run Engine 2.0 via internal/run-engine; avoid extending legacy run engine code
Applied to files:
internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2024-10-18T15:41:52.352Z
Learnt from: nicktrn
PR: triggerdotdev/trigger.dev#1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.
Applied to files:
packages/core/src/v3/errors.ts
packages/cli-v3/src/entryPoints/dev-run-controller.ts
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Define tasks using task({ id, run, ... }) with a unique id per project
Applied to files:
docs/machines.mdx
📚 Learning: 2025-08-18T10:07:17.368Z
Learnt from: CR
PR: triggerdotdev/trigger.dev#0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-08-18T10:07:17.368Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : When triggering a task multiple times in a loop from inside another task, use batchTrigger()/batchTriggerAndWait() instead of per-item trigger() calls
Applied to files:
docs/machines.mdx
🧬 Code graph analysis (4)
internal-packages/run-engine/src/engine/index.ts (2)
packages/core/src/v3/index.ts (1)
formatDurationMilliseconds
(35-35)packages/core/src/v3/schemas/common.ts (2)
TaskRunInternalError
(153-193)TaskRunInternalError
(195-195)
packages/core/src/v3/errors.ts (1)
packages/core/src/v3/links.ts (1)
links
(1-33)
packages/cli-v3/src/entryPoints/dev-run-controller.ts (2)
packages/core/src/v3/errors.ts (1)
SuspendedProcessError
(527-533)internal-packages/run-engine/src/engine/index.ts (1)
runId
(1165-1435)
packages/core/src/v3/heartbeats/manager.ts (3)
packages/core/src/v3/heartbeats/api.ts (1)
HeartbeatsManager
(70-72)packages/core/src/v3/heartbeats/types.ts (1)
HeartbeatsManager
(1-8)internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
id
(1585-1592)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: typecheck / typecheck
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
packages/trigger-sdk/src/v3/index.ts (1)
18-18
: Re-exporting heartbeats via v3 entrypoint looks goodThis aligns with our guidance to import from "trigger.dev/sdk/v3" and keeps the API discoverable from the main SDK surface.
packages/core/src/v3/links.ts (1)
17-18
: Troubleshooting link addition looks good.The new stalledExecution doc link is consistent with existing patterns.
packages/core/src/v3/errors.ts (2)
572-574
: Making pretty error messages optional is appropriate.Callsites already guard against undefined messages; this enables link-only guidance.
579-592
: OOM guidance copy reads well and points to Machines docs.No issues from me.
apps/webapp/app/v3/runEngine.server.ts (1)
18-19
: Env-plumbed flag LGTM.Uses env wrapper and follows the existing “1” sentinel pattern.
packages/cli-v3/src/entryPoints/dev-run-controller.ts (3)
528-533
: SuspendedProcessError handling is correctGracefully finishing on suspension avoids false failures. LGTM.
620-621
: Good: detach stale handlers before reattachingPrevents duplicate listeners when reusing a pooled process. LGTM.
53-54
: State guards addedisCompletingRun/isShuttingDown help avoid re-entrancy. LGTM.
docs/machines.mdx (2)
9-18
: Example looks goodSetting a larger machine preset inline in the task snippet is clear and matches the surrounding guidance.
508-523
: Sanity-check memory scaling mathYou scale ps %MEM by totalMemory/machineMemoryBytes. This assumes ps %MEM is relative to system RAM—which is true—but edge cases (cgroups, container limits) can skew results. Add a note or guard when machineMemoryBytes > totalMemory or when scaling factor is extreme.
This PR improves the experience when receiving certain types of OOM issues in production tasks, because currently we only really handle OOMs when the task run (worker) process heap exceeds the
--max-old-space-size
setting. If theRUN_ENGINE_TREAT_PRODUCTION_EXECUTION_STALLS_AS_OOM=1
env var is set, heartbeat stalls when runs are executed will be treated as OOM errors and our "OOM Retry" mechanism will be activated, if the task is configured to retry OOM errors on larger machines. I've also updated the copy of the OOM error to be more accurate:This PR also will align dev run heartbeat stalls with production runs, instead of just cancelling runs. Dev runs that stall while executing will now have the following error:
If stalls are indeed happening because of CPU-heavy work and heartbeats are unable to be sent because of a blocked event loop, we have a new utility to allow users to "yield" and allow the run the continue executing: